Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

arXiv.org Artificial Intelligence

The International Classification of Diseases (ICD) serves as a definitive medical classification system encompassing a wide range of diseases and conditions. The primary objective of ICD indexing is to allocate a subset of ICD codes to a medical record, which facilitates standardized documentation and management of various health conditions. Most existing approaches struggle to select the proper label subset from an extremely large ICD collection with a heavily long-tailed label distribution. In this paper, we leverage a multi-stage "retrieve and re-rank" framework as a novel solution to ICD indexing: a hybrid discrete retrieval method first collects candidates, and a contrastive learning model then re-ranks them, which allows the model to make more accurate predictions from a simplified label space. The retrieval model is a hybrid of auxiliary knowledge from electronic health records (EHR) and a discrete retrieval method (BM25), which efficiently collects high-quality candidates. In the last stage, we propose a label co-occurrence guided contrastive re-ranking model, which re-ranks the candidate labels by pulling together the clinical notes with positive ICD codes. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures on the MIMIC-III benchmark.
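As a rough illustration of the first stage only, the sketch below scores ICD code descriptions against a clinical note with a plain Okapi BM25 and keeps the top candidates. It is not the authors' hybrid retriever: the function names, the code labels (`code_hf`, `code_htn`, `code_dm`), the toy descriptions, and the parameters `k1` and `b` are all assumptions made for the example, and the contrastive re-ranking stage is not shown.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_candidates(note, code_descriptions, top_k=2):
    """Return up to top_k code labels whose descriptions best match the note."""
    codes = list(code_descriptions)
    docs = [code_descriptions[c].lower().split() for c in codes]
    scores = bm25_scores(note.lower().split(), docs)
    ranked = sorted(zip(codes, scores), key=lambda cs: cs[1], reverse=True)
    # Drop codes with no lexical overlap so the candidate set stays small.
    return [code for code, score in ranked[:top_k] if score > 0]


# Hypothetical labels and descriptions, for illustration only.
descriptions = {
    "code_hf": "congestive heart failure",
    "code_htn": "essential hypertension",
    "code_dm": "diabetes mellitus",
}
note = "patient admitted with heart failure and hypertension"
candidates = retrieve_candidates(note, descriptions, top_k=2)
```

A re-ranker would then only need to order the short `candidates` list rather than the full label space, which is the simplification the abstract refers to.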


SOCAIRE: Forecasting and Monitoring Urban Air Quality in Madrid

arXiv.org Artificial Intelligence

Air quality has become one of the main issues in public health and urban planning management, due to the proven adverse effects of high pollutant concentrations. Considering the mitigation measures that cities all over the world are taking in order to face frequent low air quality episodes, the capability of foreseeing future pollutant concentrations is of great importance. In this paper, we present SOCAIRE, an operational tool based on a Bayesian and spatiotemporal ensemble of neural and statistical nested models. SOCAIRE integrates endogenous and exogenous information in order to predict and monitor future distributions of the concentration for several pollutants in the city of Madrid. It focuses on modeling each and every available component which might play a role in air quality: past concentrations of pollutants, human activity, numerical pollution estimation, and numerical weather predictions. This tool is currently in operation in Madrid, producing daily air quality predictions for the next 48 hours and anticipating the probability of the activation of the measures included in the city's official air quality NO2 protocols through probabilistic inferences about compound events.
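One way to read "probabilistic inferences about compound events" is as a Monte Carlo estimate over draws from a predictive distribution. The sketch below assumes a hypothetical activation rule (at least `min_stations` stations exceeding `threshold` for `min_hours` consecutive hours) and hand-written sample values; neither the rule nor the numbers are taken from the Madrid protocol or from SOCAIRE itself.

```python
def compound_event_probability(samples, threshold, min_stations=2, min_hours=2):
    """Estimate the probability of a compound exceedance event.

    samples[i][station][hour] holds one simulated concentration; the result
    is the fraction of samples in which the (hypothetical) rule is met.
    """

    def station_triggers(series):
        # True if the series stays above the threshold for min_hours in a row.
        run = 0
        for value in series:
            run = run + 1 if value > threshold else 0
            if run >= min_hours:
                return True
        return False

    hits = 0
    for sample in samples:
        triggered = sum(station_triggers(series) for series in sample)
        if triggered >= min_stations:
            hits += 1
    return hits / len(samples)


# Four illustrative draws, two stations, three forecast hours each.
predictive_samples = [
    [[190, 200, 100], [185, 190, 170]],  # both stations exceed twice in a row
    [[190, 100, 200], [185, 190, 170]],  # station 1 exceedances not consecutive
    [[100, 190, 195], [100, 181, 182]],  # both stations exceed twice in a row
    [[100, 100, 100], [200, 200, 200]],  # only one station exceeds
]
probability = compound_event_probability(predictive_samples, threshold=180)
```

With real predictive samples the same counting would apply unchanged; only the rule and the number of draws would differ.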


Machine learning models show similar performance to Renewables.ninja for generation of long-term wind power time series even without location information

arXiv.org Machine Learning

Driven by climatic processes, wind power generation is inherently variable. Long-term simulated wind power time series are therefore an essential component for understanding the temporal availability of wind power and its integration into future renewable energy systems. In the recent past, mainly power-curve-based models such as Renewables.ninja (RN) have been used for deriving synthetic time series for wind power generation, despite their need for accurate location information and bias correction, and their insufficient replication of extreme events and short-term power ramps. We assess how time series generated by machine learning models (MLMs) compare to RN in terms of their ability to replicate the characteristics of observed nationally aggregated wind power generation for Germany. To this end, we apply neural networks to one MERRA2 reanalysis wind speed input dataset with no location information and one with basic location information. The resulting time series and the RN time series are compared with actual generation. Both MLM time series show equal or even better quality than RN, depending on the characteristics considered. We conclude that MLMs can, even with reduced information on turbine locations and turbine types, produce time series of at least equal quality to RN.